#pandas dataframe operations explained

Explore tagged Tumblr posts

Visit Tumblr Blog

Explore Tumblr blogs with no restrictions, modern design and the best experience.

Last Seen Tumblr Blogs

fay234761

Untitled

2 posts

chptla

Sin título

9 posts

mipeltaja

Painting the Horizon

18K posts

cupidsbloodybow

𝒮 ♡

9 posts

hades-in-bloom

last two brain cells

117 posts

Fun Fact

Tumblr posted its first advertisements in May 2012 and subsequently earned $13M in revenue.

codewithnazam · 1 year ago

Text

Data Manipulation: A Beginner's Guide to Pandas Dataframe Operations

Outline: What’s a Pandas Dataframe? (Think Spreadsheet on Steroids!) Say Goodbye to Messy Data: Pandas Tames the Beast Rows, Columns, and More: Navigating the Dataframe Landscape Mastering the Magic: Essential Dataframe Operations Selection Superpower: Picking the Data You Need Grab Specific Columns: Like Picking Out Your Favorite Colors Filter Rows with Precision: Finding Just the Right…

View On WordPress

#data manipulation #grouping data #how to manipulate data in pandas #pandas dataframe operations explained #pandas filtering #pandas operations #pandas sorting

0 notes

mysticpandakid · 3 months ago

Text

What is PySpark? A Beginner’s Guide

Introduction

The digital era gives rise to continuous expansion in data production activities. Organizations and businesses need processing systems with enhanced capabilities to process large data amounts efficiently. Large datasets receive poor scalability together with slow processing speed and limited adaptability from conventional data processing tools. PySpark functions as the data processing solution that brings transformation to operations.

The Python Application Programming Interface called PySpark serves as the distributed computing framework of Apache Spark for fast processing of large data volumes. The platform offers a pleasant interface for users to operate analytics on big data together with real-time search and machine learning operations. Data engineering professionals along with analysts and scientists prefer PySpark because the platform combines Python's flexibility with Apache Spark's processing functions.

The guide introduces the essential aspects of PySpark while discussing its fundamental elements as well as explaining operational guidelines and hands-on usage. The article illustrates the operation of PySpark through concrete examples and predicted outputs to help viewers understand its functionality better.

What is PySpark?

PySpark is an interface that allows users to work with Apache Spark using Python. Apache Spark is a distributed computing framework that processes large datasets in parallel across multiple machines, making it extremely efficient for handling big data. PySpark enables users to leverage Spark’s capabilities while using Python’s simple and intuitive syntax.

There are several reasons why PySpark is widely used in the industry. First, it is highly scalable, meaning it can handle massive amounts of data efficiently by distributing the workload across multiple nodes in a cluster. Second, it is incredibly fast, as it performs in-memory computation, making it significantly faster than traditional Hadoop-based systems. Third, PySpark supports Python libraries such as Pandas, NumPy, and Scikit-learn, making it an excellent choice for machine learning and data analysis. Additionally, it is flexible, as it can run on Hadoop, Kubernetes, cloud platforms, or even as a standalone cluster.

Core Components of PySpark

PySpark consists of several core components that provide different functionalities for working with big data:

RDD (Resilient Distributed Dataset) – The fundamental unit of PySpark that enables distributed data processing. It is fault-tolerant and can be partitioned across multiple nodes for parallel execution.

DataFrame API – A more optimized and user-friendly way to work with structured data, similar to Pandas DataFrames.

Spark SQL – Allows users to query structured data using SQL syntax, making data analysis more intuitive.

Spark MLlib – A machine learning library that provides various ML algorithms for large-scale data processing.

Spark Streaming – Enables real-time data processing from sources like Kafka, Flume, and socket streams.

How PySpark Works

1. Creating a Spark Session

To interact with Spark, you need to start a Spark session.

Output:

2. Loading Data in PySpark

PySpark can read data from multiple formats, such as CSV, JSON, and Parquet.

Expected Output (Sample Data from CSV):

3. Performing Transformations

PySpark supports various transformations, such as filtering, grouping, and aggregating data. Here’s an example of filtering data based on a condition.

Output:

4. Running SQL Queries in PySpark

PySpark provides Spark SQL, which allows you to run SQL-like queries on DataFrames.

Output:

5. Creating a DataFrame Manually

You can also create a PySpark DataFrame manually using Python lists.

Output:

Use Cases of PySpark

PySpark is widely used in various domains due to its scalability and speed. Some of the most common applications include:

Big Data Analytics – Used in finance, healthcare, and e-commerce for analyzing massive datasets.

ETL Pipelines – Cleans and processes raw data before storing it in a data warehouse.

Machine Learning at Scale – Uses MLlib for training and deploying machine learning models on large datasets.

Real-Time Data Processing – Used in log monitoring, fraud detection, and predictive analytics.

Recommendation Systems – Helps platforms like Netflix and Amazon offer personalized recommendations to users.

Advantages of PySpark

There are several reasons why PySpark is a preferred tool for big data processing. First, it is easy to learn, as it uses Python’s simple and intuitive syntax. Second, it processes data faster due to its in-memory computation. Third, PySpark is fault-tolerant, meaning it can automatically recover from failures. Lastly, it is interoperable and can work with multiple big data platforms, cloud services, and databases.

Getting Started with PySpark

Installing PySpark

You can install PySpark using pip with the following command:

To use PySpark in a Jupyter Notebook, install Jupyter as well:

To start PySpark in a Jupyter Notebook, create a Spark session:

Conclusion

PySpark is an incredibly powerful tool for handling big data analytics, machine learning, and real-time processing. It offers scalability, speed, and flexibility, making it a top choice for data engineers and data scientists. Whether you're working with structured data, large-scale machine learning models, or real-time data streams, PySpark provides an efficient solution.

With its integration with Python libraries and support for distributed computing, PySpark is widely used in modern big data applications. If you’re looking to process massive datasets efficiently, learning PySpark is a great step forward.

youtube

#pyspark training #pyspark coutse #apache spark training #apahe spark certification #spark course #learn apache spark #apache spark course #pyspark certification #hadoop spark certification .#Youtube

0 notes

anandshivam2411 · 8 months ago

Text

Overview of Pandas vs. NumPy

Pandas and NumPy are two important tools in Python for working with data. While they may seem similar at first, they have different purposes and special features that make them helpful for various tasks.

NumPy is mainly used for handling numbers. It helps you work with large groups of numbers, like lists and arrays. With many built-in math functions, NumPy is great for doing complex calculations quickly and easily. This makes it popular among scientists, engineers, and anyone who needs to perform math operations on data. If you are doing tasks that require fast calculations, NumPy is the library to use.

On the other hand, Pandas is focused on data analysis and organization. It offers simple tools like Series and DataFrames, which let you work with organized data without much trouble. Pandas is excellent for cleaning, changing, and exploring data, especially when dealing with messy or incomplete information. You can easily filter, group, and visualize data, making it a favorite among data analysts and researchers.

You can use both libraries together to improve how you work with data. While NumPy provides speed and efficiency for calculations, Pandas gives you the tools to manage and analyze data well.

I recently read a blog that explains everything about Pandas and NumPy in an easy-to-understand way. I think everyone should check it out to learn how these libraries can help with data work.

#DataScienceTools #PandasAndNumPy #PythonForData #DataAnalysisMadeEasy #NumPyLibrary #PandasLibrary #DataHandlingInPython #DataScienceEssentials #PythonDataAnalysis #LearnDataScience #DataScienceForBeginners #NumPyForCalculations #PandasForAnalysis #PythonLibraries

1 note · View note

sanjanabia · 11 months ago

Text

Big Data vs. Traditional Data: Understanding the Differences and When to Use Python

In the evolving landscape of data science, understanding the nuances between big data and traditional data is crucial. Both play pivotal roles in analytics, but their characteristics, processing methods, and use cases differ significantly. Python, a powerful and versatile programming language, has become an indispensable tool for handling both types of data. This blog will explore the differences between big data and traditional data and explain when to use Python, emphasizing the importance of enrolling in a data science training program to master these skills.

What is Traditional Data?

Traditional data refers to structured data typically stored in relational databases and managed using SQL (Structured Query Language). This data is often transactional and includes records such as sales transactions, customer information, and inventory levels.

Characteristics of Traditional Data:

Structured Format: Traditional data is organized in a structured format, usually in rows and columns within relational databases.

Manageable Volume: The volume of traditional data is relatively small and manageable, often ranging from gigabytes to terabytes.

Fixed Schema: The schema, or structure, of traditional data is predefined and consistent, making it easy to query and analyze.

Use Cases of Traditional Data:

Transaction Processing: Traditional data is used for transaction processing in industries like finance and retail, where accurate and reliable records are essential.

Customer Relationship Management (CRM): Businesses use traditional data to manage customer relationships, track interactions, and analyze customer behavior.

Inventory Management: Traditional data is used to monitor and manage inventory levels, ensuring optimal stock levels and efficient supply chain operations.

What is Big Data?

Big data refers to extremely large and complex datasets that cannot be managed and processed using traditional database systems. It encompasses structured, unstructured, and semi-structured data from various sources, including social media, sensors, and log files.

Characteristics of Big Data:

Volume: Big data involves vast amounts of data, often measured in petabytes or exabytes.

Velocity: Big data is generated at high speed, requiring real-time or near-real-time processing.

Variety: Big data comes in diverse formats, including text, images, videos, and sensor data.

Veracity: Big data can be noisy and uncertain, requiring advanced techniques to ensure data quality and accuracy.

Use Cases of Big Data:

Predictive Analytics: Big data is used for predictive analytics in fields like healthcare, finance, and marketing, where it helps forecast trends and behaviors.

IoT (Internet of Things): Big data from IoT devices is used to monitor and analyze physical systems, such as smart cities, industrial machines, and connected vehicles.

Social Media Analysis: Big data from social media platforms is analyzed to understand user sentiments, trends, and behavior patterns.

Python: The Versatile Tool for Data Science

Python has emerged as the go-to programming language for data science due to its simplicity, versatility, and robust ecosystem of libraries and frameworks. Whether dealing with traditional data or big data, Python provides powerful tools and techniques to analyze and visualize data effectively.

Python for Traditional Data:

Pandas: The Pandas library in Python is ideal for handling traditional data. It offers data structures like DataFrames that facilitate easy manipulation, analysis, and visualization of structured data.

SQLAlchemy: Python's SQLAlchemy library provides a powerful toolkit for working with relational databases, allowing seamless integration with SQL databases for querying and data manipulation.

Python for Big Data:

PySpark: PySpark, the Python API for Apache Spark, is designed for big data processing. It enables distributed computing and parallel processing, making it suitable for handling large-scale datasets.

Dask: Dask is a flexible parallel computing library in Python that scales from single machines to large clusters, making it an excellent choice for big data analytics.

When to Use Python for Data Science

Understanding when to use Python for different types of data is crucial for effective data analysis and decision-making.

Traditional Data:

Business Analytics: Use Python for traditional data analytics in business scenarios, such as sales forecasting, customer segmentation, and financial analysis. Python's libraries, like Pandas and Matplotlib, offer comprehensive tools for these tasks.

Data Cleaning and Transformation: Python is highly effective for data cleaning and transformation, ensuring that traditional data is accurate, consistent, and ready for analysis.

Big Data:

Real-Time Analytics: When dealing with real-time data streams from IoT devices or social media platforms, Python's integration with big data frameworks like Apache Spark enables efficient processing and analysis.

Large-Scale Machine Learning: For large-scale machine learning projects, Python's compatibility with libraries like TensorFlow and PyTorch, combined with big data processing tools, makes it an ideal choice.

The Importance of Data Science Training Programs

To effectively navigate the complexities of both traditional data and big data, it is essential to acquire the right skills and knowledge. Data science training programs provide comprehensive education and hands-on experience in data science tools and techniques.

Comprehensive Curriculum: Data science training programs cover a wide range of topics, including data analysis, machine learning, big data processing, and data visualization, ensuring a well-rounded education.

Practical Experience: These programs emphasize practical learning through projects and case studies, allowing students to apply theoretical knowledge to real-world scenarios.

Expert Guidance: Experienced instructors and industry mentors offer valuable insights and support, helping students master the complexities of data science.

Career Opportunities: Graduates of data science training programs are in high demand across various industries, with opportunities to work on innovative projects and drive data-driven decision-making.

Conclusion

Understanding the differences between big data and traditional data is fundamental for any aspiring data scientist. While traditional data is structured, manageable, and used for transaction processing, big data is vast, varied, and requires advanced tools for real-time processing and analysis. Python, with its robust ecosystem of libraries and frameworks, is an indispensable tool for handling both types of data effectively.

Enrolling in a data science training program equips you with the skills and knowledge needed to navigate the complexities of data science. Whether you're working with traditional data or big data, mastering Python and other data science tools will enable you to extract valuable insights and drive innovation in your field. Start your journey today and unlock the potential of data science with a comprehensive training program.

#Big Data #Traditional Data #Data Science #Python Programming #Data Analysis #Machine Learning #Predictive Analytics #Data Science Training Program #SQL #Data Visualization #Business Analytics #Real-Time Analytics #IoT Data #Data Transformation

0 notes

360digitmg-bangalore · 3 years ago

Text

Turn Into A Data Scientist Step-by-step Information To Turn Out To Be An Data Scientist

Incorporating data science strategies in operations in the coming years, anticipate the potential for issues, and develop strategies primarily based on information to attain success. I urge you to see this Data Science video tutorial that explains what's Data Science and all that we have discussed in the weblog. This information has lots of inconsistencies like lacking values, blank columns, abrupt values and incorrect knowledge format which must be cleaned.

visit to know more about :data scientist course in hyderabad These chatbots are excellent applications and are used across different sectors, including hospitality, banking, retail, and publishing. Enroll in this Machine Learning Course for extra in-depth studying. So, all it takes is one line of code, and we're able to extract all those records where the age of the individual is exactly 50. Now, simply imagine, should you needed to manually go through each of the 32,561 records to check the age of the person!! Thank God that we are in a position to manipulate knowledge with only a single line of code. Now, let’s go forward and understand every of these intimately. Data Scientists need knowledge of statistics and programming. You might be happy to know that 360DigiTMG offers probably the greatest Data Science programs within the nation that can help you find out about Data Scientists and the instruments and methods utilized by them. You may even participate in many hands-on tasks to learn how to cope with industry-specific solutions. This specialization, created by Johns Hopkins University, is comprised of 10 programs and is supposed to cover the whole gamut. It focuses not only on information analysis, but in addition on the delicate expertise wanted to be a data scientist—like making inferences and asking the proper questions. For machine learning in Python, you should learn to use the scikit-learn library. If you are already an intermediate pandas person, you could want to be taught my high 25 pandas tricks, learn about finest practices with pandas, or take my online pandas course. If you are fascinated within the exciting world of data science, however do not know the place to begin, Data School is here to help. Thank you a lot for taking your treasured time to learn this weblog. Video and pc games are now being created with the help of information science and that has taken the gaming experience to the subsequent stage. Healthcare companies are using knowledge science to construct refined medical devices to detect and treatment diseases. Data Science has also made inroads into the transportation trade, such as with driverless automobiles. The most common algorithm used for sample discovery is Clustering. Although nothing can exchange an in-depth understanding of a variety of models, I created a comparability chart of supervised learning fashions that will function a helpful reference guide. Pandas offers a high-performance information structure (called a "DataFrame") that's suitable for tabular data with columns of various types, just like an Excel spreadsheet or SQL desk. A background in statistics is especially useful for understanding statistical distributions, estimators and checks. The outcomes of statistical findings are generally required by corporations to find a way to make knowledgeable decisions. You can learn information science by yourself with on-line courses or even YouTube movies. There isn't any dearth of studying supplies on the Internet if you’re working towards a career on this area. Ultimately, the choice between Python and R comes all the way down to your career targets. The average salary of Data Scientists within the United States is US$120,000, and the typical wage in India is ₹1,000,000 per yr. These self-driving automobiles are the future of the automotive business. Chatbots are basically automated bots, which respond to all our queries. Having a solid understanding of today’s important programming languages might help a data skilled stand out within the job subject. Those new to the field should acquire a radical understanding of core knowledge analytics tools like NumPy, Pandas and Matplotlib. You may additionally consider studying particular libraries for interacting with web data similar to Requests and BeautifulSoup. Use data visualization and storytelling to convey findings to numerous stakeholders. Here are some transient overviews of a couple of use circumstances, showing information science’s versatility. When the data has been utterly rendered, the info scientist interprets the information to search out opportunities and options. Data Science is hailed as the most popular technology of the twenty first century. The major purpose for its reputation is that it allows data-driven choice making in real-world eventualities. You will analyze varied studying techniques like classification, association and clustering to construct the mannequin. Can perform in-database analytics utilizing frequent data mining functions and basic predictive models. Google's Python Class is finest for people with some programming expertise, and contains lecture videos and downloadable workouts. Teamwork plays an essential role whereas delivering the end result to the companies, firms we are working as information scientists. This powerful coding language is not solely useful to internet builders, and you simply might end up able the place you could need them in an expert information setting. Is considered by many to be the scripting language of the web. Data Science applications provide a greater stage of therapeutic customisation via genetics and genomics analysis. Tech companies that acquire consumer data can utilise strategies to rework that knowledge into valuable or worthwhile information. The information used for analysis can come from many different sources and presented in various codecs. Some degree of programming is required to execute a successful data science project. Python is especially popular because it’s easy to be taught, and it supports a number of libraries for knowledge science and ML. Data science is an important part of many industries today, given the huge quantities of knowledge which would possibly be produced, and is certainly one of the most debated subjects in IT circles. Its recognition has grown through the years, and corporations have started implementing information science methods to grow their enterprise and increase buyer satisfaction.

For more information: data scientist course in hyderabad

360DigiTMG - Data Analytics, Data Science Course Training Hyderabad

Address - 2-56/2/19, 3rd floor,, Vijaya towers, near Meridian school,, Ayyappa Society Rd, Madhapur,, Hyderabad, Telangana 500081

099899 94319

Visit on map : https://g.page/Best-Data-Science

0 notes

sparkbyexamples · 4 years ago

Text

How to Convert Pandas to PySpark DataFrame

While working with a huge dataset, Pandas are not good enough to perform complex transformation operations hence if you have a Spark cluster, it’s better to convert Pandas to PySpark DataFrame, apply the complex transformations on Spark cluster, and convert it back. In this article, I will explain steps in converting Pandas to PySpark DataFrame and how to Optimize the Pandas to PySpark DataFrame…

View On WordPress

#Pandas

0 notes

siva3155 · 6 years ago

Text

350+ TOP PYTHON Interview Questions and Answers

PYTHON Interview Questions for freshers & experienced :-

1) What Is Python? Python is an interpreted, interactive, object-oriented programming language. It incorporates modules, exceptions, dynamic typing, very high level dynamic data types, and classes. Python combines remarkable power with very clear syntax. It has interfaces to many system calls and libraries, as well as to various window systems, and is extensible in C or C++. It is also usable as an extension language for applications that need a programmable interface. Finally, Python is portable: it runs on many Unix variants, on the Mac, and on PCs under MS-DOS, Windows, Windows NT, and OS/2. 2) What are the different ways to create an empty NumPy array in python? There are two methods we can apply to create empty NumPy arrays. The first method. import numpy numpy.array() The second method. # Make an empty NumPy array numpy.empty(shape=(0,0)) 3) Can’t concat bytes to str? This is providing to be a rough transition to python on here f = open( ‘myfile’, ‘a+’ ) f.write(‘test string’ + ‘\n’) key = “pass:hello” plaintext = subprocess.check_output() print (plaintext) f.write (plaintext + ‘\n’) f.close() The output file looks like: test string 4) Expline different way to trigger/ raise exception in your python script? Raise used to manually raise an exception general-form: raise exception-name (“message to be conveyed”). voting_age = 15 if voting_age output: ValueError: voting age should be at least 19 and above 2.assert statements are used to tell your program to test that condition attached to assert keyword, and trigger an exception whenever the condition becomes false. Eg: a = -10 assert a > 0 #to raise an exception whenever a is a negative number Output: AssertionError Another way of raising an exception can be done by making a programming mistake, but that is not usually a good way of triggering an exception 5) Why is not__getattr__invoked when attr==’__str__’? The base class object already implements a default __str__ method, and __getattr__function is called for missing attributes. The example as it we must use the __getattribute__ method instead, but beware of the dangers. class GetAttr(object): def __getattribute__(self, attr): print(‘getattr: ‘ + attr) if attr == ‘__str__’: return lambda: ‘’ else: return lambda *args: None A better and more readable solution to simply override the __str__ method explicitly. class GetAttr(object): def __getattr__(self, attr): print(‘getattr: ‘ + attr) return lambda *args: None def __str__(self): return ‘’ 6)What do you mean by list comprehension? The process of creating a list performing some operation on the data so that can be accessed using an iterator is referred to as list comprehension. EX: Output: 65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90 7) What will be the output of the code:def foo (i=)? i.append (1) return i >>> foo () >>> foo () Output: The argument to the function foo is evaluated once when the function is defined However since it is a list on every all the list is modified by appending a 1 to it. 8) How to Tic tac toe computer move? Below The code of computer move in the game tic tac toe in python def computermove(board,computer,human): movecom=” rmoves=rd(0,8) for movecom in legalmoves(board): board=computer if winner(board)==computer: return movecom board=” for movecom in legalmoves(board): board=human if winner(board)==human: return movecom board=” while rmoves not in legalmoves(board): rtmoves=rd(0,8) return rmoves 9) Explain about ODBC and python? ODBC (Open Database Connectivity) API standard allows the connections with any database that supports the interface such as the PostgreSL database or Microsoft access in a transparent manner Three types of ODBC modules for python: PythonWin ODBC module – limited development mxODBC – a commercial product pyodbc – This is open source python package 10) How to implement the decorator function, using dollar ()? Code: def dollar(fn): def new(*args): return ‘$’ + str(fn(*args)) return new @dollar def price(amount, tax_rate): return amount + amount*tax_rate print price(100,0.1) output: $110

PYTHON Interview Questions 11) How to count the number of instance? You have a class A, you want to count the number of A instance. Hint: use staticmethod Example class A: total = 0 def __init__(self, name): self.name = name A.total += 1 def status(): print “Number of instance (A) : “, A.total status = staticmethod(status) a1 = A(“A1”) a2 = A(“A2”) a3 = A(“A3”) a4 = A(“A4”) A.status() Output: The number of instance (A) : 4 12) What are the Arithmetic Operators that Python supports? ‘+’ : Addition ‘-’ : Subtraction ‘*’ : Multiplication ‘/’: Division ‘%’: Modulo division ‘**’: Power Of ‘//’: floor div Python does not support unary operators like ++ or – operators. Python supports “Augmented Assignment Operators”. i.e., A += 10 Means A = A+10 B -= 10 Means B = B-10 13) How do you reload a Python module? All that needs to be a module object to the imp.reload() function or just reload() in Python 2.x, and the module will be reloaded from its source file. Any other code references symbols exported by the reloaded module, they still are bound to the original code. 14) How does Python handle Compile-time and Run-time code checking? Python supports compile-time code checking up to some extent. Most checks for variable data types will be postponed until run-time code checking. When an undefined custom function is used, it will move forward with compile-time checking. During runtime, Python raises exceptions against errors. 15) What are Supporting Python packages for data science operations? Pandas: A package providing flexible data structures to work with relational or labeled data. NumPy: A package that allows working with numerical based data structures like arrays and tensors. Matplotlib: A 2D rendering engine written for Python. Tensorflow: A package used for constructing computational graphs. 16) What are the ones that can be used with pandas? A python dict, ndarray or scalar values can be used with Pandas. The passed index is a list of axis labels. 17) How To Add an Index, Row or Column to a Pandas DataFrame? The index can be added by calling set_index() on programmer DataFrame. For accessing rows, loc works on labels of programme index, iloc works on the positions in programme index, it is a more complex case: when the index is integer-based, programmer passes a label to ix. 18) How To Create an Empty DataFrame? The function that programmer will use is the Pandas Dataframe() function: it reuires the programmer to pass the data that programmer wants to put in, the indices and the columns. 19) Does Pandas Recognize Dates When Importing Data? Yes. but programmer needs to help it a tiny bit: add the argument parse_dates when programmer by reading in data from, let is say, a comma-separated value (CSV) file. 20) How to convert a NumPy array to a Python List? Use tolist(): import numpy as np >>> np.array(,]).tolist() , ] 21) How to set the figure title and axes labels font size in Matplotlib? Functions dealing with text like label, title, etc. accept parameters same as matplotlib.text.Text. For the font size you can use size/fontsize: 39) What is dictionary in Python? The built-in datatypes in Python are called a dictionary. It defines one-to-one Relationship between keys and values. It contains a pair of keys and their corresponding values. Dictionaries are indexed by keys. It is a collection which is unordered, changeable and indexed. Let’s take an example: The following example contains some keys. State, Capital,Language. Their corresponding values are Karnataka, Bangalore, and Kannada respectively. Dict={ ‘Country’:’Karnataka’,’Capital’:’Bangalore’,’Launguage’:’Kannada’} print dict Karnataka Print dict Bangalore Print dict Kannada 40) How memory is managed in Python? Python private heap space manages python memory. Python heap has all Python objects and data structures. Access to this private heap is restricted to programmer also Python private heap is taken care by the interpreter. The core API gives access to some tools for the programmer to code. Python memory manager allocates python heap space. 41)What is the output of this following statement? f=none for i in range(5); with open(“data.txt”, ”w”) as f: if I>1: break print f.closed A) True B) False C) None D) Error Ans: A 42) Write a coding in Find a Largest Among three numbers? num1 = 10 num2 = 14 num3 = 12 if (num1 >= num2) and (num1 >= num3): largest = num1 elif (num2 >= num1) and (num2 >= num3): largest = num2 else: largest = num3 print(“The largest number between”,num1,”,”,num2,”and”,num3,”is”,largest) Output: The largest Number is 14.0 43) What is Lambda in Python? lambda is an one line anonymous function, Example: Sum=lambda i,c:i+c 44) What is the difference between list and tuples? Lists are the mutable elements where we can able to perform the task in the existed variable. Lists can able to reduce the utilization of memory Tuples are immutable so it can execute faster when compared with list. But it will wastes the memory. 45) What are the key features of Python? The python doesn’t have any header files It doesn’t have any structure or syntax except the indentation. It can execute the instructions fastly because of the RISC architecture. It consumes only less memory because of no internal executions. It doesn’t have any compilers compilation can be done at the time of the program. 46) How to delete a file in Python? In Python, Delete a file using this command, os.unlink(filename) or os.remove (filename) 47) What is the usage of help() and dir() function in Python? Help() and dir() both functions are accessible from the Python interpreter used for viewing a consolidated dump of built-in functions. Help() function: The help() function is used to display the documentation string and also facilitates you to see the help related to modules, keywords, attributes, etc. 48) Which of the following statements create a dictionary? (Multiple Correct Answers Possible) a) d = {} b) d = {“john”:40, “peter”:45} c) d = {40:”john”, 45:”peter”} d) d = (40:”john”, 45:”50”) Ans: All of the above 49) Which of the following is an invalid statement? a) abc = 1,000,000 b) a b c = 1000 2000 3000 c) a,b,c = 1000, 2000, 3000 d) a_b_c = 1,000,000 Ans: c 50) What is the output of the following? try: if ‘1’ != 1: raise “someError” else: print(“someError has not occured”) except “someError”: print (“someError has occured”) a) someError has occured b) someError has not occured c) invalid code d) none of the above Ans: b 51) What is the maximum possible length of an identifier? a) 31 characters b) 63 characters c) 79 characters d) None of the above Ans: d 52) Differentiate list and tuple with an example? difference is that a list is mutable, but a tuple is immutable. Example: >>> mylist= >>> mylist=2 >>> mytuple=(1,3,3) >>> mytuple=2 TypeError: ‘tuple’ object does not support item assignment 53) Which operator will be helpful for decision making statements? comparison operator 54) Out of two options which is the template by default flask is following? a) Werkzeug b) Jinja2 Ans : b 55) Point out the use of help() function Help on function copy in module copy: copy(x) Shallow copy operation on arbitrary Python objects. 56) From below select which data structure is having key-value pair ? a.List b.Tuples c.Dictionary Ans : c 57) Differentiate *args and **kwargs? *args : We can pass multiple arguments we want like list or tuples of data **kwargs : we can pass multiple arguments using keywords 58) Use of Negative indices? It helps to slice from the back mylist= >>>mylist 6 59) Give an example for join() and split() funcitons >>> ‘,’.join(‘12345’) ‘1,2,3,4,5’ >>> ‘1,2,3,4,5’.split(‘,’) 60) Python is case sensitive ? a.True b.False Ans : a 61) List out loop breaking functions break continue pass 62) what is the syntax for exponentiation and give example? a**b 2**3 = 8 63) Which operator helps to do addition operations ? arithmetic operator 64) How to get all keys from dictionary ? dictionary_var.keys() 65) Give one example for multiple statements in single statement? a=b=c=3 66) What is the output for the following code? >> def expandlist(val, list=): list.append(val) return list >>> list1 = expandlist (10) >>> list2 = expandlist (123,) >>> list3 = expandlist (‘a’) >>> list1,list2,list3 Ans : (, , ) 67) Number of argument’s that range() function can take ? 3 68) Give an example to capital first letter of a string? a=’test’ print a.upper() Test 69) How to find whether string is alphanumeric or not? str = “hjsh#”; print str.isalnum() Ans :False 70) Which method will be used to delete a file ? os.remove(filename) 71) What is difference between match & search in regex module in python? Match Checks for a match only at the beginning of the string, while search checks for a match anywhere in the string. 72) Can we change tuple values? If yes, give an example. Since tuple are immutable, so we cannot change tuple value in its original form but we can convert it into list for changing its values and then convert again to tuple. Below is the example: my_tuple=(1,2,3,4) my_list=list(my_tuple) my_list=9 my_tuple=tuple(my_list) 73) What is purpose of __init__ in Class ? Is it necessary to use __init__ while creating a class ? __init__ is a class contructor in python. __init__ is called when we create an object for a class and it is used to initialize the attribute of that class. eg : def __init__ (self, name ,branch , year) self.name= name self.branch = branch self.year =year print(“a new student”) No, It is not necessary to include __init__ as your first function every time in class. 74) Can Dictionary have a duplicate keys ? Python Doesn’t allow duplicate key however if a key is duplicated the second key-value pair will overwrite the first as a dictionary can only have one value per key. For eg : >>> my_dict={‘a’:1 ,’b’ :2 ,’b’:3} >>> print(my_dict) {‘a’: 1, ‘b’: 3} 75) What happened if we call a key that is not present in dictionary and how to tackle that kind of error ? It will return a Key Error . We can use get method to avoid such condition. This method returns the value for the given key, if it is present in the dictionary and if it is not present it will return None (if get() is used with only one argument). Dict.get(key, default=None) 76) What is difference b/w range and arange function in python? numpy.arange : Return evenly spaced values within a given interval. Values are generated within the half-open interval stop, dtype=None) Range : The range function returns a list of numbers between the two arguments (or one) you pass it. 77) What is difference b/w panda series and dictionary in python? Dictionaries are python’s default data structures which allow you to store key: value pairs and it offers some built-in methods to manipulate your data. 78) Why it need to be create a virtual environment before staring an project in Django ? A Virtual Environment is an isolated working copy of Python which allows you to work on a specific project without worry of affecting other projects. Benefit of creating virtualenv : We can create multiple virtualenv , so that every project have a different set of packages . For eg. if one project we run on two different version of Django , virtualenv can keep thos projects fully separate to satisfy both reuirements at once.It makes easy for us to release our project with its own dependent modules. 79) How to write a text from from another text file in python ? Below is the code for the same. import os os.getcwd() os.chdir(‘/Users/username/Documents’) file = open(‘input.txt’ ,’w’) with open(“output.txt”, “w”) as fw, open(“input.txt”,”r”) as fr: 80) what is difference between input and raw_input? There is no raw_input() in python 3.x only input() exists. Actually, the old raw_input() has been renamed to input(), and the old input() is gone, but can easily be simulated by using eval(input()). In python 3.x We can manually compile and then eval for getting old functionality. python2.x python3.x raw_input() input() input() eval(input()) 81) What are all important modules in python reuired for a Data Science ? Below are important module for a Data Science : NumPy SciPy Pandas Matplotlib Seaborn Bokeh Plotly SciKit-Learn Theano TensorFlow Keras 82) What is use of list comprehension ? List comprehensions is used to transform one list into another list. During this process, list items are conditionally included in the new list and each items are transformed as reuired. Eg. my_list= my_list1= Using “for “ loop : for i in my_list1: my_list.append(i*2) Using List comprehension : my_list2= print(my_list2) 83) What is lambda function ? lambda function is used for creating small, one-time and anonymous function objects in Python. 84) what is use of set in python? A set is a type of python data Structure which is unordered and unindexed. It is declared in curly braces . sets are used when you reuired only uniue elements .my_set={ a ,b ,c,d} 85) Does python has private keyword in python ? how to make any variable private in python ? It does not have private keyword in python and for any instance variable to make it private you can __ prefix in the variable so that it will not be visible to the code outside of the class . Eg . Class A: def __init__(self): self.__num=345 def printNum(self): print self.__num 86) What is pip and when it is used ? it is a package management system and it is used to install many python package. Eg. Django , mysl.connector Syntax : pip install packagename pip install Django : to install Django module 87) What is head and tail method for Data frames in pandas ? Head : it will give the first N rows of Dataframe. Tail : it will give last N rows of Dataframe. By default it is 5. 88) How to change a string in list ? we can use split method to change an existing string into list. s= ‘Hello sam good morning ’ s.split() print(s) 89) How to take hello as output from below nested list using indexing concepting in python. my_list=, 4,5]],3,4] Ans : my_list print(my_list) 90) What is list when we have to use ? Lists always store homogeneous elements. we have to use the lists when the data is same type and when accessing is more insteading of inserting in memory. 91) What is dict when we have to use ? Dict is used to store key value pairs and key is calculated using hash key. This is used when we want to access data in O(1) time as big O notation in average case. Dict I used in u can say super market to know the price of corresponding while doing billing 92) What is tuple when we have to use ? Tuple is hetrogenous and we have to use when data is different types. 93) Is String Immutable ? Yes because it creates object in memory so if you want to change through indexing it will throw an exception since it can’t be changes I,e immutable. 94) How to handle Exception ? We can handle exceptions by using try catch block . we can also else block in python to make it executed based on condition. 95) Will python work multiple inheritance? Yes it works .by seuentially referring parent class one by one. 96) Will class members accessible by instances of class? Yes by referring corresponding attributes we can access. 97) What are Special methods in python and how to implement? Special methods in python are __init__,__str__,__iter__,__del__ __init__-it will initialize when class loads. __str__-It is used to represent object in a string format. __iter__-it I used to define iteration based on reuirements. __del__-It is used to destroy object when it is not reuired for memory optimization. 98) How to handle deadlock in python. By providing synchronization methods so that each thread access one at a time.It will lock another thread until thread fine it execution. 99) How for loop will works in python? For loop internally calls iter method of an object for each call. 100) What is List comprehension how to define it and when to use? List Comprehensions are expression based iteration. So we have to give expression and then provide loop and provide if condition if needed. We have to use when we want to define in such a way that write the code in a compact way. 101) What is set when we have to use? Set is used to define uniue elements without duplicates. So if you have lump of data and we are searching through email record. By using set we can get the uniue elements. 102) How django works ? Django will take an url from frontend and look for url reolvers and url will ap corresponding view and if data to be handled it will use certain model to make any database transactions and give repone via view and then passs to UI. Or django template 103) Is python pure object oriented programming ? Yes in python all types are stored a objects. 104) What are packages in python which are commonly used explain one ? The packages used are os, sys,time,tempfile,pdb, Os –it is used for file and directories handling. Pdb-It is used to debug the code to find the root cause of issue. 105) How will you merge 2 dictionaries in python? a = {1:’1’} , b={2:’2’} c= {**a,**b} 106) What is the other way of checking truthiness? These only test for truthiness: if x or y or z: print(‘passed’) if any((x, y, z)): print(‘passed’) 107) How will you verify different flags at once? flags at once in Python v1,v2,v3 = 0, 1, 0 if v1 == 1 or v2 == 1 or v3 == 1: print(‘passed’) if 1 in (v1, v2, v3): print(‘passed’) 108) What happens when you execute python == PYTHON? You get a Name Error Execution 109) Tool used to check python code standards? Pylint 110) How strings can be sliced? They can be generally treated as arrays without commas. Eg: a = “python” a -> i can be any number within the length of the string 111) How to pass indefinite number of arguments to any function? We use **args when we don’t know the number of arguments to be passed 112) In OOPS what is a diamond problem in inheritance? During multiple inheritance, when class X has two subclasses Y and Z, and a class D has two super classes Y and Z.If a method present in X is overridden by both Y and Z but not by D then from which class D will inherit that method Y or Z. 113) Among LISTS,SETS,TUPLES which is faster? Sets 114) How Type casting is done in python? (Str -> int) s = “1234” # s is string i = int(s) # string converted to int 115) How python maintains conditional blocks? Python used indentation to differentiate and maintain blocks of code 116) Write a small code to explain repr() in python ? Repr gives the format that can be read by the compiler. Eg: y=2333.3 x=str(y) z=repr(y) print ” y :”,y print “str(y) :”,x print “repr(y):”,z ————- output y : 2333.3 str(y) : 2333.3 repr(y) : 2333.3000000000002 117) How to encrypt a string? str_enc = str.encode(‘base64’, ‘strict’) 118) Functions are objects -> Explain ? # can be treated as objects def print_new(val): return val.upper() print ( print_new(‘Hello’)) yell = print_new print yell(‘different string’) 119) Explain the synbtax to split a string in python? Str.split(separator,max_split) 120) How can you identify the data type of any variable in python? Use type(var) 121) What does MAP function in python do? map() returns a list of the results after it applys the function to each item in a iterable data type (list, tuple etc.) 122) What does the enum function in python do? When we need to print the vars index along when you iterate, we use the enum function to serve this purpose. 123) Explain assert in action? assert “py” == “PY”, “Strings are not eual” 124) How does pop function works in set data types? Pop deletes a random element from the set 125) Is Python open source? If so, why it is called so? Python is an open source programming language. Because Python’s source code (the code in which Python software is written) is open for all and anyone can have a look at the source code and edit. 126). Why Python is called portable? Because we can run Python in wide range of hardware platforms and has similar interfaces across all the platforms 127) How to give comments in Python? Using Hashes (#) at the starting of a line 128) How to create prompt in the console window? Using input function 129) How to write multiple statements in a single line in Python? Using semicolon between the statements 130) List out standard datatypes in Python Numbers, string, list, tuple, dictionary 131) Which standard datatype in Python is immutable? tuple 132) What is indexing? Explain with an example Indexing is the numbering of characters in string or items in list, tuple to give reference for them. It starts from 0. Str = “Python”. The index for P is 0, y is 1, t is 2 and goes on. 133).Which statement is used to take a decision based on the comparison? IF statement 134) List out atleast two loop control statements break, continue, pass 135) What is the result of pow(x,y) X raised to the power Y 136) What is the difference between while and for loop? While loops till the condition fails, for loops for all the values in the list of items provided. 137) Which method removes leading and trailing blanks in a string? strip – leading and trialing blanks, lstrip – leading blanks, rstrip – trailing blanks 138) Which method removes and returns last object of a list? list.pop(obj=lst) 139) What is argument in a function? Argument is the variable which is used inside the function. While calling the function we need to provide values to those arguments. 140) What is variable length argument in function? Function having undefined no. of arguments are called variable length argument function. While calling this function, we can provide any no. of arguments 141) What is namespace? Namespace is the dictionary of key-value pairs while key is the variable name and value is the value assigned to that variable. 142) What is module? Module is a file containing python code which can be re-used in a different program if it is a function. 143) Which is the default function in a class? Explain about it – _init_. It is called class contructor or initialization method. Python calls _init_ whenever you create a instance for the class 144) What is docstring? How to define it? docstring is nothing but a comment inside the block of codes. It should be enclosed inside “”” mark. ex: “”” This is a docstring ””” 145) What is the default argument in all the functions inside a class? Self 146) How to send a object and its value to the garbage collection? del objname 147) How to install a package and import? In DOS prompt, run pip install package_name and run import package_name in editor window in Python’s IDE. 148) Name the function which helps to change the files permission os.chmod 149) Which is the most commonly used package for data importing and manipulation? Pandas 150) Will python support object oriented? Yes, it will support by wrapping the code with objects. 151) IS python can be compatible with command prompt? Yes, it can be accessed through command prompt. 152) How Lists is differentiated from Tuples? List are slow, can be edited but Tuples are fast and cannot be edited. 153). Use of NUMPY package? It is fastest, and the package take care of the number calculations. 154). Uses of python? Pie charts, web application, data modeling, automation and Cluster data. 155) Does python interact with Database? Yes, it interfaces to most of the Databases. 156) Is python is intended oriented? Yes, it will throw error if it is not in seuence. 157) How is Garbage handled in python? It will be automatically handle the garbage after the variable is used. 158) How will you check python version? Using python –version. 159) How will you uit the python? Using exit() 160) Does Python has any command to create variable? No, just (x =244) 161) What is complex type in python? It is mixture of variable and number. 162) Casting in python? To make String use command str(2) = ‘2’ 163) What is strip in python? Used to remove white spaces in String 164) Other String literals? Lower, upper, len, split, replace. 165) Python operators? Arithmetic, Assignment, Comparison, Logical, Identity, Membership and Bitwise. 166) Membership operator in python? In and not in. 167) Lambda in python? Can take only one expression but any number of Argument. 168) Dict in python? It is something like key and value pair as Map in java. 169) Does python has classes? In python all are denoted as some classes. 170) Multi threading on python? It is a package in python and it use GIL to run the thread one after the other. But isn’t it being not good to use here. 171) What is python private heap space? It is a inbuild garbage collection like java and this space can be used by the developer. 172) Does python support inheritance? Yes, it supports all forms of inheritance single, multiple, hierarchical and multi-level 173) Benefits of Flask? It is light weight and independent package. Mainly a web micro framework. 174) How dir() function is used in python? The defined symbols are defined here. 175) Will exit method in python de allocate the global namespace? No, it has a specific mechanism which it follows as an individual portion. 176) Has python has monkey patching concept within? Yes of course, it does dynamic transactions during the run time of the program. 177) args vs kwargs? Args – don’t know how many arguments are used. Kwargs- don’t know how many keywords are used. 178) use of isupper keyword in python? This will prompt the upper keyword of any character in a string literal. 179) pickling vs unpickling? If the objects translated from string then it seems to be pickling If the String is dumped to objects then it seems to un picking 180) What is py checker in python? It is tool to uantitatively detects the bugs in source code. 181) What are the packages? NUMPY, SCIPY, MATLAB, etc 182) Pass in Python? IT is a namespace with no character and it can be moved to next object. 183) How is unit test done in python? It is done in form of Unittest. This does major of testing activity. 184) Python documentation is called? DoctString such as AI, Python jobs ,Machine learning and Charts. 185) Convert Sting to number and viceversa in python? Str() for String to number and oct() for number to string. 186) Local vs Global in python? Anything inside the function body is local and outside is global as simple as that. 187) How to run script in python? Use py command or python command to run the specific file in Unix. 188) What is unlink in python? This is used to remove the file from the specified path. 189) Program structure in python? Always import the package and write the code without indention 190) Pyramid vs Django? Both used for larger application and Django comes with a ORM framework. 191) Cookies in python? Sessions are known as cookies here it is used to reuest from one object to other. 192) Different types of reuest in python? Before reuest – it is used to passes without the arguments. After reuest – it is used to pass the reuest and response will be generated. Tear down reuest – it is used as same as past but it does not provide response always and the reuest cant be changed. 193) How is fail over mechanism works in python? Once the server shoots the fail over term then it automatically tends to remove the packet each on the solid base and then re shoot again on its own. Socket wont get removed or revoked from the orgin. 194) Dogpile mechanism explain? Whenever the server host the service and when it gets multiple hits from the various clients then the piles get generated enormously. This effect will be seems as Dogpile effect. This can be captured by processing the one hit per time and not allowed to capture multiple times. 195) What is CHMOD 755 in python? This will enhance the file to get all the privileges to read write and edit. 196) CGI in Python? This server mode will enable the Content-type – text/html\r\n\r\n This has an extension of .cgi files. This can be run through the cgi command from the cmd prompt. 197) Sockets explain? These are the terminals from the one end to the other using the TCP, UDP protocols this reuires domain, type, protocol and host address. Server sockets such as bind, listen and accept Client socket such as connect. 198) Assertions in python? This is stated as the expression is hits when we get the statement is contradict with the existing flow. These will throw the error based on the scenario. 199) Exceptions in python? This is as same as JAVA exceptions and it is denoted as the try, catch and finally this also provides the user defined expression. 200) What made you to choose python as a programming language? The python programming language is easy to learn and easy to implement. The huge 3rd party library support will make python powerful and we can easily adopt the python 201) what are the features of python? The dynamic typing Large third party library support Platform independent OOPs support Can use python in many areas like machine learning,AI,Data science etc.. 202) How the memory is managed in python? The private heap space is going to take care about python memory. whenever the object is created or destroyed the heap space will take care. As a programmer we don’t need to involve in memory operations of python 203) What is the process of pickling and unpicling? In python we can convert any object to a string object and we can dump using inbuilt dump().this is called pickling. The reverse process is called unpicling 204). What is list in python? A list is a mutable seuential data items enclosed with in and elements are separated by comma. Ex: my_list=] In a list we can store any kind of data and we can access them by using index 205) What is tuple in python? A tuple is immutable seuential data element enclosed with in () and are separated by comma. Ex: my_tuple=(1,4,5,’mouli’,’python’) We use tuple to provide some security to the data like employee salaries, some confidential information 206) Which data type you prefer to implement when deal with seuential data? I prefer tuple over list. Because the tuple accessing is faster than a list because its immutability 207) What are advantages of a tuple over a list? We can use tuple as a dictionary key because it is hash able and tuple accessing very fast compare to a list. 208) What is list comprehension and dictionary comprehension and why we use it? A list comprehension is a simple and elegant way to create a list from another list. we can pass any number of expressions in a list comprehension and it will return one value, we can also do the same process for dictionary data types Data= Ex: new_list = 209) What is the type of the given datatype a=1? a)int b)Tuple c)Invalid datatype d)String Ans:b 210) Which is the invalid variable assignment from the below? a)a=1,2,3 b)The variable=10 c)the_variable=11 d)none of the above Ans:b 211) Why do we use sets in python? Generally we use sets in python to eliminate the redundant data from any data. And sets didn’t accept any mutable data types as a element of a set Ex: my_set={123,456,’computer’,(67,’mo’)} 212) What are the nameless functions in python? The anonymous functions are called nameless functions in python. We can also call it as lambda function. The lambda functions can be called as a one liner and can be created instantly Syntax: lambda arguments: expression Ex: hello=lambda d:d-(d+1) To call the lambda function Hello(5) 213) What is map and filter in python? Map and filter are called higher order functions which will take another functions as an argument. 214) What is the necessity to use pass statement in python program? Pass is no operation python statement. we can use it while we are implementing the classes or functions or any logic. If class is going be define later in the development phase we can use pass statement for the class to make it syntactically make it valid. Ex: def library(): Pass 215) What is *kwargs and **kwargs? Both are used in functions. both are allowed to pass variable number of arguments to a function only difference is *kwargs is used for non-key word arguments and **kwargs is used for key word arguments Ex: def kwargs(formal_arg, *kwargv): print(“first normal arg:”, formal_arg) for arg in kwargv: print(“another arg through *argv:”, arg) kwargs(‘mouli’, ‘ramesh’, ‘rajesh’, ‘kanna’) 216) Explain about negative indexing? Negative indexing is used in python seuential datatypes like list,string,tuple etc We can fetch the element from the back with out counting the list index Ex: list1 217) What is file context manager? To open a file in safe mode we use WITH context manager. This will ensure the file crashing from some exceptions. we don’t need to close the file explicitly Ex: with open(‘sample.txt’,’w’) as f: Pass 218) Explain between deep and shallow copy? The deep copy , copy the object with reference so that if we made any changes on the original copy the reference copy will be effected, shallow copy ,copy the object in a separate memory so that if we do any changes on original it won’t effect the shallow copy one 219) How can you make modules in python? First we need to save the file with somename.py Second import the somename.py in the newfile.py, so that we can access the somename.py functions in the newfile.py. so that somename.py acts as a module. Even we can share our module to the rest of the world by registering to PYPY community 220) Explain about default database with python? SLite3 comes with python3. It is light weight database for small scale of application 221) What are different modes in file operations? There are 3 modes in python file operations read, write and append sometimes we can do both at a time. read(),readline(),readlines() are the inbuilt functions for reading the file write() is inbuilt function for writing to the file 222) What is enumerate() explain its uses? Enumerate is a built in function to generate the index as we desired in the seuential datatypes Ex: for c ,i in enumerate(data,p): Print(c,i) Here p is optional if we don’t want it we can eliminate it 223) Can we use else with for loop in python? Yes we can use. once all the for loop is successfully executed the else part is going to execute, If there are any error occurs or any break happened in the loop then the else is not going to execute Ex: for I in list1: print(i) Else: print(execution done) even we can use else with while also 224) What is type() and id() will do? The type() will give you the information about datatype and id() will provide you the memory location of the object 225) What is decorators? The decorators are special functions which will very useful when tweaking the function or class.it will modify the functionality of another function. 226) Explain about different blocks in exception handling? There are three main blocks in python exception handling Try Except Finally In the try block we will write all the code which can be prone to error, if any error occurred in this block it will go to the except block. If we put finally block also the execution will hit the finally block. 227) Explain inheritance in python? Inheritance will allow the access to the child call meaning it can access the attributes and methods of the base. There are many types in the inheritance Single inheritance: in this one, have only one base class and one derived class Multilevel inheritance: there can be one or more base classes and one more derived classes to inherit Hierarchical: can derive any number of child classes from single base class Multiple: a single derived can be inherited from any number of base classes 29.write sorting algorithm in python for given dataset= using list comprehension x= print(x.sort()) 228) Explain about multi-threading concept in python? Multi-threading process can be achieved through the multiprocess inbuilt module. GIL(global interpreter lock ) will take care about the multiprocessing in python. simultaneously there are several threads can be run at same time. The resource management can be handled by GIL. 229) Can we do pattern matching using python? Yes, we can do it by using re module. like other programming languages python has comes with powerful pattern matching techniue. 230) What is pandas? Pandas is data science library which deal with large set of data. pandas define data as data frame and processes it. Pandas is a third party library which we need to install. 231) What is pip? Pip is a python package installer. Whenever we need third party library like paramiko,pandas etc We have to use pip command to install the packages Ex: pip install paramiko 232) What is the incorrect declaration of a set? a)myset={} b)myset=set() c)myset=set((1,2,3)) d)myset={1,2,3} Ans:a 233) What is OS module will do in python? OS module is giving access to python program to perform operating system operations like changedirectory, delete or create. Ex: import os os.cwd() 234) What is scheduling in threading? Using scheduling we can decide which thread has to execute first and what is the time to execute the thread. And it is highly dynamic process 235) What is the difference between module and package? A package is folder which can have multiple modules in it. We can import module by its package name.module name 236) How we can send email from python? We can use smtplib inbuilt module to define smtp client, that can be used to send email 237) What is TKIner? TKIner is a python inbuilt library for developing the GUI 238) How can you prevent abnormal termination of a python program We can prevent the abnormal termination by using the exception handling mechanism in python. Try , except and finally are the key words for handling the exception. we can raise our own exceptions in the python. They are called user exceptions 239) what module is used to execute linux commands through the python script and give us with one example We can use OS module to execute any operation system commands. We have to import the OS module first and then give the commands Ex: import os Print(os.system(‘nslookup’+’127.10.45.00’)) 240) what is the process to set up database in Django First we need to edit the settings.py module to set up the database. Django comes with SLite database by default, if we want to continue with default database we can leave settings.py as it is. If we decide to work with oracle or other kind of databases like oracle your database engine should be ‘django.db.backends.oracle’. if it is postgresl then the engine should ‘django.db.backends.postgresl_psycopg2’. We can add settings like password, name host etc. 241) what is Django template A django template is a simple text file which is used to create HTML,CSV or XML. A template contains variables that is replaced with values when we evaluates it 242) what is the uses of middleware in Django? Middleware is responsible for user authentication, session management . 243) what is Django architecture Django architecture contains models ,views, templates and controller The model describes the database schema and data structure. the views retrieves data from model and pass it to the template. Templates are described how the user see it. controller is the logic part and heart of the Django 244) List some of the data science libraries in python NumPy Pandas SciPy Matplotlib 245) How do substitute a pattern in a string using re module Import re >>> re.sub(‘’, ‘o’, ‘Space’) ‘Spooe’ >>> re.sub(‘’, ‘n’, re.sub(‘’, ‘o’, ‘Space’)) ‘Spoon’ 246) What is random module will do in python and what are the functions we can apply on random module Random module will gives the random number from the specific range. Every time we execute we will get the random number Randrange() Randint() Choice() Shuffle() Uniform() Are some of the useful functions in random module 247) What are the noted modules of python in terms of networking Paramiko, netmiko, pexpect etc These module will create a ssh connection between server and the program 248) What is beautifulSoup module will do in python? We are using the module for pulling the data from HTML and XML files 249) What is reuests module will do? It is a python http library. The aim of the reuests module is to make http reuests simpler and more human friendly Ex: Import reuests r = reuests.get(‘https://api.github.com/user’, auth=(‘user’, ‘pass’)) r.status_code 200 >>> r.headers ‘application/json; charset=utf8’ >>> r.encoding ‘utf-8′ >>> r.text # doctest: +ELLIPSIS u'{“type”:”User”…’ >>> r.json() # doctest: +ELLIPSIS {u’private_gists’: 419, u’total_private_repos’: 77, …} 250) What are the basic datatypes in python? Python datatypes include int, float, strings, lists, tuples, sets, dictionaries. 251) How Manages to Python Handle Memory Management? Python is a separate on heaps to keep its memory. So the heap contains all the Python information and these data structures. And it’s the Python created handler that manages the Individual heap. Python employs a built-in garbage receiver, which salvages all the available memory including offloads it to some heap space. 252) What is means by string Python? A string in Python is a mixture of the alpha-numeric volume of characters. They are clear of objects Volume. It suggests that both don’t help move once all get assigned a value. Python provides to programs of join(), replace(), or split() to alter strings variable. 253) What does the meaning of Slicing in python? Python Slicing is defined as Lists of Tuples and Arrays Volume function. The Lists element function has a default bit fo the functionality while slicing. If there is a no conseuence of before that first colon, it expects to begin at the start index of the list. 254) Definition of %S In Python? Python it has to been guide for formatting of any value into a string volume function. It may include uite complex characters. It’s One of the popular usages of the start contents into a string including the %s form specifier. The %S formatting helps Python in a similar variable volume syntax as the C function printf(). 255) what does a function of python programming? A function is an object which describes a block of the system and is a reusable object. It takes modularity to a performance code program and a higher level of code reusability. Python has to give us several built-in functions Volume such as print() function volume and it gives the ability to perform a user-defined function. 256) How to write a functioning volume for python? Step-1: To begin the function Volume of start writing the function with the keyword and then specify the Volume function name. Step-2: We can immediately give the reasons and enclose them doing the parentheses. Step-3: After pushing an enter, we can do it determine the coveted Python records for execution. 257) What is means by Call function in Python? A python function value gets treated because of a callable object. It can provide any thoughts value and also pass a value or increased values into the model of a tuple. Apart from this function, Python should additional constructs, such as being groups or the class instances fit in the related category. 258) How to use of return keywords in python? The purpose of a value function get the inputs and return value of some output. The return value of is a Python statement if it’s we can relate to using for sending content following to its caller. 259) What is meant by“Call By Value” In Python? In call-by-value, that argument to be valued expression or value becomes connected to the particular variable in this function. Python command treats that variable being confined within the function-level field. Any changes done to this variable will continue local and order reflect outside the function. 260) What does means by “Call By Reference” In Python? The Call-by-reference we pass as an argument by reference volume, then it is possible because of an absolute source on the use, first then some simple copy. In such any case, any change to the discussion instructions further is obvious to the caller. 261) Difference between Pass and Continue In Python? The continue report executes the loop value to return from the following emphasis. On the opposite, that passing record instructs to make nothing, including the remainder from the code effects as usual. 262) What is meant by R strip() In Python? Python gives the r-strip() system to increases this string value function but allows avoid this whitespace symbols of that end. This r-strip() transmits that numbers value function of right end based upon particular argument value a string specifying the group of numbers to get excluded. 263) What does defined by whitespace in python? Whitespace is representing the characters string value function that we practice for spacing moreover separation. They maintain the“empty” value function symbol. In Python, it could move some tab or space. 264) What is defined Isalpha() In Python? Python has been provided that built-in isalpha() value function for each string manipulating purpose. It reflects the True value function if all types in this string value function are of alphabet type number, else value function it returns False. 265) What does making the CPython Different From Python? Jython means an implementation from some Python programming language that can operate code using on this Java platform. Jython is uiet as compared to CPython and reuires agreement with CPython libraries. A Python implementation is written in C# getting a Microsoft’s .NET framework. 266) Which is the package Fastest Form Of Python? PyPy gives maximum agreement while utilizing CPython implementation as increasing its performance. The tests verified that PyPy is almost five times faster than uniue CPython. 267) What does the meaning of GIL In Python Language? Python is helped to GI(thats means by the global interpreter) which operates some mutex done to ensure introduction into Python objects, synchronizing multiple threads of running these Python bytecodes at the same time. 268) How do Python Thread Safe? Python ensures the reliable path of the threads. It does this GIL mutex to secure synchronization. If a thread fails the GIL lock through any time, when you must to get this system thread-safe. 269) How Does determine the Python Manage The Memory? Python performs a property manager within which operates any of its articles also data structures. This heap manager makes that allocation/de-allocation from heap space to objects. 270) What is a means by “Tuple In Python”? A tuple is a group of specific data structure under Python is immutable. They mean similar to progressions, really prefer the lists. Also, that tuples follow parentheses as including, but these programs have suare sections in their syntax. 271) What does means by split do in Python? This is the opposite of order which mixes or combines strings within one. To do this, you practice this split function value. What it takes is divided or develop up a string and attach that data into each order collection using a specified separator. If none separator is specified while you charge against specific function, whitespace order signify done by default. 272) How do you convert a string to in python? Use the “int” String function value to convert the number to an integer value. Add five value to the integer. Then, the “str” function value it’s to converts the integer function value to a string value function that Python concatenates and print the output value of the answer. 273) How do you reverse any string in Python? This is continued the reverse value function part syntax. It goes outcomes too by doing – by leaving start value and end off value and defining a step of -1, it reverses value function a string function. 274) What does by Python a scripting language? Python is identified as a scripting language because it is an interpreted language also because that is simple to record scripts in it. A defined python communication programming is a language whose programs have to be obtained before they can be run. 275) What language is Python based on? Since largest recent OS continue written in C, compilers/editors before improved high-level languages exist also written in C. Python continues an exception – its various popular/”traditional” implementation means described CPython more is written in C. 276) What is the best free website to learn Python? Python.org. is one the best Python Software Foundation’s official website is further one of the valuable free source locations.SoloLearn- If it refers to a modular, crash-course-like information environment, SoloLearn gives an excellent, step-by-step knowledge program for beginners, TechBeamers , Hackr.io, Real Python. 277) Difference between Python and Java? The Two biggest difference languages signify that the Java is one the statically typed program coding language and Python is one of the dynamical typed. Python is very heavily code programming language but dynamically typed. In certain means types in one code remain confined to strongly Copied something at runtime. 278) How Can you declare the variables function in Python? In Java or C, every variable must be certified before it can be used. Declaring the variable means connecting it to a data type value function. Declaration of variables is expected in Python. You can specify an integer value function to a variable, use it is an integer value function for a while and when specifying a string to the variable function. 279) How to declare the variables function in Python? Python is defined as a dynamically typed variable, which indicates that you have to declare what type each function variable is. In Python, variables do a storage placeholder during texts and numbers variable. It needs to convert one name so that you remain ualified to get this again. The variable does forever assign with an eual sign, replaced by the value of the variable function. 280) How do you reverse the string in python? There is no such inbuilt function for this. The Easiest way for reversing the string in python is using slice which steps backwards, -1. For example: txt = “Hello World” print(txt). 281) WAP to find the given string in the line? This is the WAP for finding the given string in line. Str = ‘Hello world’ If ‘hello’ in str: Print ‘string found’. 282) What is class variable in python? The Class variable are also known as static variables. These variables are shared by all objects. In Python the variables that are assigned the value in class declaration are known as class variables. 283) What is class in Python? The python is “object oriented language”. Almost all the codes of this language are implemented using a special construct called Class. In simple words, “Class” is an object constructer in Python. 284) How can you handle multiple exception in python? To handle multiple exception in python you can use try statement. You can also use these blocks: The try/except blocks The finally blocks The raise keywords Assertions Defining your own exception 285) Can we write else statement try block in python? Yes, it is possible to write else statement try block. try: operation_that_can_throw_ioerror() except IOError: handle_the_exception_somehow() else: # we don’t want to catch the IOError if it’s raised another_operation_that_can_throw_ioerror() finally: something_we_always_need_to_do(). 286) Does Python have do-while loop statements? No, Python doesn’t have any do-while loop statements. 287) What is the difference between range and xrange in Python? In python the range and xrange are two functions that are used repeat number of time in for loops. The major difference between rang and xrange is that the xrange returns the xrange object while the range returns a python list objects. The xrange is not capable for generating the static list at run-time. On the other hand range can do that. 288) Is it possible to inherit one class from another class? Yes, we can inherit one class from another class in python. 289) Name different types of inheritance in python? The inheritance refers to the capability of on class to derive the properties from other class. In python, there are two major types of inheritance. Multiple Inheritance Multilevel Inheritance 290) What is polymorphism? The polymorphism in python refers to various types of respond to the same function. In Greek language the word poly means “many” and morphism means “forms”. This means that the same function name is being used on objects of different types. 291) How do you convert string as a variable name in python? The simplest way to convert string as a variable name is by using vars(). 292) Why do we want to use break statement in while-loop? While-loop can convert into the infinite loop if you don’t use break statement. 293) Why we are using Def keyword for method? The Def keyword in python is used to form a new user-defined function. The def keywords mark the beginning of function header. The functions are the objects through which one can easily organize the code. 294) Why are we using self as first argument? The first argument represents the current instance of the class. The first argument is always called self. With the use of “self” keyword one can easily access the characteristics and methods of the class in python. 295) Why we are using a Python Dictionary? There is huge collection of data values in the python dictionary. These dictionaries are accessed to retrieve the value of the keys that unknown to the users. There is a key: value pair provided in the dictionary which makes it more optimized. 296) What are the use of tuples in Python? A tuple in python is a series of immutable Python objects. These tuples are similar to the list that are used for organizing data to make it easier to understand. If Python has created a tuple in memory, it difficult to change them. 297) What are the use of sets in Python? The Python Set is the collection objects similar to lists and dictionaries. All the elements should be original and uniue and must be immutable. The python sets in comparison with list provides highly optimized method for ensuring whether a specific element is contained in the set. 298) Does Python supports hybrid inheritance? No, python doesn’t support hybrid inheritance. But we can use straight method and round diamond method we can achieve it. 299) What is the uses of middleware in Django? Middleware is responsible for user authentication, session management . 300) Explain Deep Copy in Python There are some values copied already. To store those copied values, Deep copy is used. Unlike Shallow copy, Deep copy will not copy the reference pointers. 301) Define the usage of split If you want to separate a provided string in Python, use split() function. 302) What is the keyword to import a module in Python? Use the keyword ‘import’ to import the modules in Python. 303) List out the different types of inheritance available in Python Hierarchical inheritance, Multi-level inheritance, Multiple inheritance, and Single Inheritance are the four types inheritance available in Python. 304) Define monkey patching You can make dynamic modifications to a module or class during the run-time. This process is called monkey patching in Python. 305) Explain encapsulation Binding the data and code together is known as encapsulation. Example of encapsulation is a Python class. 306) Define Flask in Python Flask, a microframework principally constructed for a minor application with easier reuirements. External libraries must be used in Flask and flask is always ready to use state. 307) Define Pyramid in Python For larger application, you can make use of Pyramid and this is hefty configurable concept. Pyramid affords suppleness and permits the developer to employ the appropriate tools for their assignment. 308) Define Django in Python Similar to Pyramid, Django is built for larger applications and ORM is included. 309) Provide the Django MVT Pattern Django Pattern 310) Why to use Python numpy instead o f lists? Python numpy is convenient, less memory and rapid when compared to lists. Hence, it is better to use python numpy. 311) Mention the floor division available in Python Double-slash (//) is the floor division in Python. 312) Is there any maximum length expected for an identifier? No, there is no maximum length expected for an identifier as it can have any length. 313) Why do we say “a b c = 1000 2000 3000” is an invalid statement in Python? We cannot have spaces in variable names and hence a b c = 1000 2000 3000 becomes invalid statement. 314) Mention the concept used in Python for memory managing Python private heap space is the one used to manage memory. 315) What are the two (2) parameters available in Python map? Iterable and function are the two (2) parameters available in Python map 316) Explain “with” statement in Python As soon as there is a block of code, you can open and close a file using “with” statement in Python. 317) What are the modes to open a file in Python? read–write mode (rw), write-only mode (w), and read-only mode (r) is the three (3) modes to open a file in Python. 318) Try to provide the command to open a file c:\welcome.doc for writing Command to open a file for writing f= open(“welcome.doc”, “wt”) 319) Explain Tkinter in Python An inbuilt Python module helpful in creating GUI applications is known as Tkinter. 320) What does the keyword do in python? The yield keyword can turn ant function into a generator. It works like a standard return keyword. But it will always return a generator object. A function can have multiple calls the keyword. Example: def testgen(index): weekdays = yield weekdays yield weekdays day = testgen(0) print next(day), next(day) Output: Sun mon PYTHON Interview Questions with Answers Pdf Download Read the full article

0 notes

bionicly-blog · 8 years ago

Text

Data Insights Case Study Part 2 - More Preprocessing

In the last couple posts, we’ve:

Identified the problem / hypothesis to go after.

Figured out the data set to work with.

Figured out a subset of the data to load into a relational DB (and structured in a roughly “standardized” format).

Connected the data in the relational DB to frontend Python and showing the data as a Pandas dataframe.

Now we have to do more work. If you even give a cursory look at edu_indicators_f and econ_indicators data (or the tables in the relational DB), you’ll notice how the data is uneven and not ready for an algorithm like logistic regression to operate on them:

Different potential parameters and potential target variables have different volumes in the data

There are a LOT of such parameters and target variables (132 economic indicators and 23 education indicators)

The years have different ranges for different parameters/variables (although most end around 2014).

The values have a wide range, and are likely not comparable and would distort algorithms that try to extract insight out of them.

For example, run these queries:

select distinct indicatorcode, count(*) as volume from econ_indicators group by indicatorcode order by volume desc;

select distinct indicatorcode, count(*) as volume from edu_indicators_f group by indicatorcode order by volume desc;

select distinct(indicatorcode), min(year), max(year) from econ_indicators group by indicatorcode

select distinct(indicatorcode), min(year), max(year) from edu_indicators_f group by indicatorcode

select distinct(indicatorcode), min(val), max(val) from edu_indicators_f group by indicatorcode

You’ll see some results such as:

The challenge, then, is what do we do about it. There are many ways to tackle this but this is how I approached it:

First, the target variables. There are 132 economic indicators, so it’s not very practical to try figure out the relationship between the education factors and ALL the economic indicators. We’ll have to filter down to a few candidates that covers the most number of countries and the most number of years:

select distinct(indicatorcode), count(distinct(countrycode)) as numcountries, count(distinct(year)) as numyears from econ_indicators group by indicatorcode order by numcountries desc, numyears desc;

Obviously, some intuition/judgement need to be made here to pick the top features, or it could be done through a threshold (e.g., numcountries >= 230). For purposes of this initial data exploration and not having to write 2313414 posts, we’ll stick to NY.GDP.MKTP.CD for now, as the target variable, and can always revisit later.

create table econ_target as select indicatorcode, countrycode, year, val from econ_indicators where indicatorcode = 'NY.GDP.MKTP.CD'

Second, for the features (educational indicators), we don’t need all of them either. Because it’s a much smaller set, we can probably use downstream algorithms like PCA to figure out which subset best explains the values in the target variables. But for sake of this initial exploration (and again, not having to write 2313414 posts ;)), we’ll arbitrarily limit it to at least 230 countries and 45 years. We can always come back and all the rest back in for more in-depth exploration.

select distinct(indicatorcode), count(distinct(countrycode)) as numcountries, count(distinct(year)) as numyears from edu_indicators_f group by indicatorcode order by numcountries desc, numyears desc

create table edu_candidates as select indicatorcode, countrycode, year, val from edu_indicators_f where indicatorcode in ('SE.PRM.ENRL.FE.ZS', 'SE.SEC.ENRL.GC.FE.ZS', 'SE.SEC.ENRL.FE.ZS', 'SE.PRM.TCHR.FE.ZS', 'SE.SEC.ENRL.VO.FE.ZS', 'SE.SEC.TCHR.FE.ZS', 'SE.SEC.TCHR.FE')

Third, for both education and economic indicators, we don’t need the country name or the indicator name, nor the ID columns. Rather, we can drop these and rely on the code columns. When we need to identify what the actual variables are, we can always use the original table mappings to retrieve the full names. To make downstream (e.g., Python side) processing easier, let’s consolidate both candidate feature and target variables in one table and join on the country and years. Also, since we don’t care about what the target variable is called (it’s always 'NY.GDP.MKTP.CD' in our case), we can drop that as well:

create table ed_ec_trans as select ed.year as yr, ed.countrycode as country, ed.indicatorcode as edu_ind, ed.val as edu_val, ec.val as ec_val from econ_target as ec, edu_candidates as ed where ed.countrycode = ec.countrycode and ec.year = ed.year;

Fourth, we need to read it into the Python side (using Pandas), and then reformat it so that all the variables in edu_ind are turned into columns (features), to make it easier to process downstream. If you need a primer on how pivot tables work in Pandas, check this out. Now, assuming you’re using InsightTools class that we defined in the previous post, replace the credentials with your own:

def main():

iTools = InsightTools('localhost', '<your database>', '<your user>', '<your password>'); df = iTools.fetch_data_as_pd('select * from ed_ec_trans limit 200');

dfp = pd.pivot_table(df,index=['yr', 'country', 'ec_val'], columns=['edu_ind'], fill_value=0, aggfunc=‘sum’)

dfp.to_csv('dfp.csv')

Note the limit 200 and the CSV file. This is so that you can quickly test and check whether the format of the data is what you expect...and while you can print to screen to check, with large dataframes (even with something like Python’s tabulate) it becomes harder to read. But limit the results and export to CSV and then you can view it nicely in Excel or Libreoffice (if you’re on Linux like me). Also, don’t forget the aggfunc=‘sum’ since by default, pivot_table will try to aggregate by averaging the values...and while that may be ok, we don’t want to do computations in our data just yet as we’ll do quite a bit of these in the next post.

Now, If you look closely at the CSV file as shown in Libreoffice, you’ll see how ec_val is in the index section as a level (our target variable) and all the edu_val variables on the right of it are the feature columns that we’ll deal with in the next post.

Before we can use the dataframe further, we need to pull the target variable out of the index and onto its own column:

dfp = dfp.reset_index(level=['ec_val'])

If you export to CSV again, you’ll see how ec_val is now its own column and not part of the index:

We can leave the year and country as part of the index.

Fifth, we need to scale the values in the feature and target columns, as we don’t want certain features to otherwise distort the downstream algorithms. There are many ways to scale, but to start with (and keeping in mind we may do PCA later on), we’ll use standardization. And we’ll also move up the 2nd-level education column names to first level and name the economic indicator variable (the ‘y’) variable as well, so we can more easily refer to all these later:

from sklearn.preprocessing import StandardScaler;

scaler = StandardScaler()

dfp_std = pd.DataFrame(scaler.fit_transform(dfp), columns = dfp.columns)

dfp_std.to_csv('dfp_std.csv')

dfp_std.columns = [col[1] for col in dfp_std.columns]

dfp_std.columns.values[0] = 'ec_ind'

You’ll see the the dataframe that’s almost ready for downstream processing when you open the CSV file:

Finally, did you notice something we did earlier that we should undo? ;) If you’ve been paying close attention, we need to remove the “limit 200″ and then run the script again, to pull in and preprocess the entire dataset:

df = iTools.fetch_data_as_pd('select * from ed_ec_trans');

In the next post, we’ll start the fun stuff to actually clean insight from the preprocessed data.

#streambionicly #data

0 notes

just4programmers · 8 years ago

Text

Best Python Machine Learning Libraries

In this article I am going to share some popular and best python machine learning libraries.

I will advise you to go through Introduction to Machine Learning article (an introductory blogpost) to get better insights as we move further.

Here we will be focusing on some of the cool packages and libraries that we can use during our project life cycle in Machine Learning.

Best Python Machine Learning Libraries

Guys, primarily we need to opt a language for our journey with ML from R & Python, so based on the public interest and keeping various other factors in mind we will be continuing our rest of the session with Python as a language.

Image Source

Here we are going to discuss about some of the basic Python machine learning libraries and packages that some of you might have used during your projects and on the other hand some of the packages that are specific and beneficial for Machine Learning. So let’s start with discussing the importance of the packages and what functionalities do they have to offer.

NumPy

NumPy (stands for Numerical Python) is one of the most famous and commonly used python package among data scientists and ML engineers. This is a part of Python’s SciPy Stack, which is basically a collection of software specially designed for scientific computations. However the stack mentioned above is pretty vast. In this post we’ll focus on some of the essential libraries pertaining to python.

Talking about NumPy, it provides several features to work with n-dimensional arrays and matrices in python. This library provides vectorization of mathematical operations on the NumPy array type which adds up to the performance of the execution.

Pandas

The Pandas library is too a well-known library in the world of Analytics and Data Sciences. This package is primarily designed to work with simple and relational data. This is one of the favorite libraries among the data scientists for easy data manipulation, visualization as well as aggregation.

If talking about the data structures, there are basically two prime data structures available in the library which are Series (one-dimensional) & Data Frames ( two-dimensional) and we think these are not that significant to talk about as of now.

Let’s see some of the basic functionalities that Pandas has to offer:

We can very easily delete as well as add a columns from DataFrame

Pandas can be used to convert the Data Structures in to DataFrame objects.

If we have any redundancy in the dataset in the form of missing data represented as ‘NaN’, this is the perfect tool to remove that

Can be used for grouping of the attributes based strictly on their functionality.

SciPy

This is a SciPy library, do not get confused with SciPy Stack that we have mentioned earlier. SciPy is a library that contains modules for Liner Algebra, Statistics, Optimization & Integration. This fact cannot be denied that the main functionality of SciPy is built upon NumPy.

The purpose mentioned above like statistics, optimization is served by this library with the help of its specific sub-modules (in which the functions are well documented).

Note: These three libraries that we’ve mentioned above are the core libraries, i.e. they can be frequently used in the python programming as well as for highly specific tasks like Data Analysis and Machine Learning.

Let us now see some of the more great libraries that add up to the beauty of python when working with data.

Libraries for Data Visualization

Matplotlib

Seaborn

Bokeh

Plotly

These are the libraries that are frequently being used in Data Sciences preferably for data visualization. We do not need to explain them right now but will be using in the script whenever required.

Libraries for Machine Learning

Scikit-Learn

Keras

Theano

TensorFlow

Libraries for Natural Language Processing

NLTK (Natural Language Toolkit)

Gensim

Libraries for Data Mining & Statistics

Scrapy

Statsmodels

These are some of the most familiar machine learning libraries in python that are being preferred and used by the data scientists and engineers. You can also find some other packages/libraries useful depending on your needs.

Let’s see now what Github has to say for the use of different libraries throughout:

Bonus Tip: As some of our readers might be working on their ML projects with the help of R language. So here I am attaching a screenshot for them to choose a best performing package based on downloads.

Image Source

Alright guys that is all for today. We hope you enjoyed learning with us. We will be coming with such articles on regular intervals. Stay tuned.

The post Best Python Machine Learning Libraries appeared first on The Crazy Programmer.

#The Crazy Programmer

0 notes

dada-data-blog · 8 years ago

Text

Machine Learning for Data Analysis: LASSO Regression (python)

SUMMARY

LASSO stands for "Least Absolute Selection and Shrinkage Operator". It is a supervised machine learning method that can be useful for reducing a large set of predictor explanatory variables to a smaller set of the the most accurate predictors of the target variable. For those variables that the LASSO method determines are the least significant predictors, the coefficient is reduced to 0 and the coefficients for the variables that are the better predictors is calculated.

I added the additional explanatory variables of Carbon Dioxide Emissions and Suicide per 100th to those used for the last assignment for a total of 12 explanatory variables. All variables are from the Gapminder dataset provided by Coursera.

Target Variable: incomegrp. Gross Domestic Product per capita in constant 2000 US$. This is a binary categorial variable created from ‘incomeperperson’. 0=lower income, 1 = higher income

Predictor/Explanatory Variables:

For this assignment I used the original quantitative versions of these variables from the Gapminder dataset. I previous assignments I changed them to binary categorical variables.

armedforcesrate: Armed forces personnel (% of total labor force) alcconsumption: alcohol consumption per adult femaleemployrate: female employees (% of population) hivrate: estimated HIV Prevalence % - (Ages 15-49) internetuserate: Internet users (per 100 people) relectricperperson: residential electricity consumption, per person (kWh) polityscore: Democracy score (Polity) urbanrate: urban population (% of total) lifeexpectancy: 2011 life expectancy at birth (years) employrate: total employees age 15+ (% of population) suicideper100TH: suicide per 100 000 co2emissions: cumulative CO2 emission (metric tons)

RESULTS

The results below indicate that the following variables are the least effective predictors because their coefficient was was reduced to “0″: employrate, armedforcesrate, alcconsumption, suicideper100th, co2emissions.

The most effective predictors because they had the highest coefficent are urbanrate, hivrate, lifeexpectancy and internetuserate.

Compared to the results of the last assignment, which used Random Forest, the most effective predictors determined by the Random Forest and LASSO are different, but the least effective predictors that were included in both tests are the same: employrate, armedforcesrate, alccosumption. So it appears that LASSO could be useful when you have a very large set of predictor variables that you want to reduce to the most significant predictors.

The Mean Square Error (MSE) rate for the training data was .092 and for the test data was .061 which is reflected in the R-square values of 0.63 and 0.76, meaning that training data explained 63% and the test data 76% of the variance in predicting lower and higher income. It is preferable to have the MSE and R-square values for training and test data to be more consistent, and I’d like to do more research to determine why they aren’t closer in value.

Coefficients:

{'urbanrate': 0.7209398362572198, 'employrate': 0.0, 'lifeexpectancy': 0.41845865663194542, 'armedforcesrate': 0.0, 'polityscore': 0.14381892362581544, 'relectricperperson': 0.15238876795635295, 'internetuserate': 0.40935103458276811, 'femaleemployrate': -0.047801254916007535, 'alcconsumption': 0.0, 'hivrate': 0.59565425188405363, 'suicideper100th': 0.0, 'co2emissions': 0.0} training data MSE 0.0918224301364 test data MSE 0.0606194488686 training data R-square 0.632441792817 test data R-square 0.75550155623

CODE

# -*- coding: utf-8 -*- """ 07-29-17

@author: kbolam """

# -*- coding: utf-8 -*-

from pandas import Series, DataFrame import pandas import numpy as np import matplotlib.pylab as plt from sklearn.cross_validation import train_test_split from sklearn.linear_model import LassoLarsCV

""" Machine Learning for Data Analysis Random Forests """

# bug fix for display formats to avoid run time errors pandas.set_option('display.float_format', lambda x:'%.2f'%x)

data = pandas.read_csv('gapminderorig.csv')

# convert all variables to numeric format data['incomeperperson'] = pandas.to_numeric(data['incomeperperson'], errors='coerce') data['urbanrate'] = pandas.to_numeric(data['urbanrate'], errors='coerce') data['employrate'] = pandas.to_numeric(data['employrate'], errors='coerce') data['lifeexpectancy'] = pandas.to_numeric(data['lifeexpectancy'], errors='coerce') data['armedforcesrate'] = pandas.to_numeric(data['armedforcesrate'], errors='coerce') data['polityscore'] = pandas.to_numeric(data['polityscore'], errors='coerce') data['relectricperperson'] = pandas.to_numeric(data['relectricperperson'], errors='coerce') data['internetuserate'] = pandas.to_numeric(data['internetuserate'], errors='coerce') data['femaleemployrate'] = pandas.to_numeric(data['femaleemployrate'], errors='coerce') data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce') data['hivrate'] = pandas.to_numeric(data['hivrate'], errors='coerce') data['suicideper100th'] = pandas.to_numeric(data['suicideper100th'], errors='coerce') data['co2emissions'] = pandas.to_numeric(data['co2emissions'], errors='coerce')

data_clean=data.dropna()

#Change Target variable to binary categorical variable def incomegrp (row): if row['incomeperperson'] <= 3500: return 0 elif row['incomeperperson'] > 3500 : return 1

# added ".loc[:," to "data_clean['armedforcesgrp'] =" to get rid of copy error "Try using .loc[row_indexer,col_indexer] = value instead" # still getting error for another line but all predictors statement seem to be working with update and no more errors data_clean.loc[:,'incomegrp'] = data_clean.apply (lambda row: incomegrp (row),axis=1)

chk2 = data_clean['incomegrp'].value_counts(sort=False, dropna=False) print(chk2)

#select predictor variables and target variable as separate data sets predvar= data_clean[['urbanrate','employrate','lifeexpectancy','armedforcesrate', 'polityscore','relectricperperson','internetuserate','femaleemployrate','alcconsumption', 'hivrate','suicideper100th','co2emissions']]

target = data_clean.incomegrp

# standardize predictors to have mean=0 and sd=1 predictors=predvar.copy()

''' -- Used "MinMaxScalar" function with "fit_transform" instead of "preprocessing.scale" to get rid of "too large values" error below. MinMax scales value to between -1 and 1, I think.: UserWarning: Numerical issues were encountered when centering the data and might not be solved. Dataset may contain too large values. You may need to prescale your features. -- Added extra set of [] around predictor on right side to stop predictor from being transformed into numpy array AND to avoid getting Deprecation reshape warning https://stackoverflow.com/questions/35166146/sci-kit-learn-reshape-your-data-either-using-x-reshape-1-1 '''

from sklearn.preprocessing import MinMaxScaler min_max=MinMaxScaler() predictors['urbanrate']=min_max.fit_transform(predictors[['urbanrate']].astype('float64')) predictors['employrate']=min_max.fit_transform(predictors[['employrate']].astype('float64')) predictors['lifeexpectancy']=min_max.fit_transform(predictors[['lifeexpectancy']].astype('float64')) predictors['armedforcesrate']=min_max.fit_transform(predictors[['armedforcesrate']].astype('float64')) predictors['polityscore']=min_max.fit_transform(predictors[['polityscore']].astype('float64')) predictors['relectricperperson']=min_max.fit_transform(predictors[['relectricperperson']].astype('float64')) predictors['internetuserate']=min_max.fit_transform(predictors[['internetuserate']].astype('float64')) predictors['femaleemployrate']=min_max.fit_transform(predictors[['femaleemployrate']].astype('float64')) predictors['alcconsumption']=min_max.fit_transform(predictors[['alcconsumption']].astype('float64')) predictors['hivrate']=min_max.fit_transform(predictors[['hivrate']].astype('float64')) predictors['suicideper100th']=min_max.fit_transform(predictors[['suicideper100th']].astype('float64')) predictors['co2emissions']=min_max.fit_transform(predictors[['co2emissions']].astype('float64'))

predictors=predictors.dropna()

predictors.dtypes predictors.describe()

print(type(predictors)) print(type(target))

# split data into train and test sets. pred_train, pred_test, tar_train, tar_test = train_test_split(predictors, target, test_size=.3, random_state=123)

# specify the lasso regression model model=LassoLarsCV(cv=10, precompute=False).fit(pred_train,tar_train)

# print variable names and regression coefficients print (dict(zip(predictors.columns, model.coef_)))

# plot coefficient progression m_log_alphas = -np.log10(model.alphas_) ax = plt.gca() plt.plot(m_log_alphas, model.coef_path_.T) plt.axvline(-np.log10(model.alpha_), linestyle='--', color='k', label='alpha CV') plt.ylabel('Regression Coefficients') plt.xlabel('-log(alpha)') plt.title('Regression Coefficients Progression for Lasso Paths')

# plot mean square error for each fold m_log_alphascv = -np.log10(model.cv_alphas_) plt.figure() plt.plot(m_log_alphascv, model.cv_mse_path_, ':') plt.plot(m_log_alphascv, model.cv_mse_path_.mean(axis=-1), 'k', label='Average across the folds', linewidth=2) plt.axvline(-np.log10(model.alpha_), linestyle='--', color='k', label='alpha CV') plt.legend() plt.xlabel('-log(alpha)') plt.ylabel('Mean squared error') plt.title('Mean squared error on each fold')

# MSE from training and test data from sklearn.metrics import mean_squared_error train_error = mean_squared_error(tar_train, model.predict(pred_train)) test_error = mean_squared_error(tar_test, model.predict(pred_test)) print ('training data MSE') print(train_error) print ('test data MSE') print(test_error)

# R-square from training and test data rsquared_train=model.score(pred_train,tar_train) rsquared_test=model.score(pred_test,tar_test) print ('training data R-square') print(rsquared_train) print ('test data R-square') print(rsquared_test)

0 notes

lewiskdavid90 · 8 years ago

Text

85% off #Data Analysis with Python & Pandas – $10

Learn Python for data analysis and visualization by analyzing large datasets. Covering Python 3, Pandas, and Seaborn.

All Levels, – 5.5 hours, 70 lectures

Average rating 4.4/5 (4.4 (381 ratings) Instead of using a simple lifetime average, Udemy calculates a course’s star rating by considering a number of different factors such as the number of ratings, the age of ratings, and the likelihood of fraudulent ratings.)

Course requirements:

A working computer (Windows, Mac, or Linux) No prior knowledge of Python is required

Course description:

This Python course will get you up and running with using Python for data analysis and visualization. You will learn how to handle, analyze and visualize data in Python by actually completing two big data analysis projects, one demonstrated through videos and another laid out through six exercises.

The course assumes you have no prior knowledge of Python, so you also get to learn the basics of Python in the first two sections of the course. However, if you already know Python, the first two sections can serve as a refresher before you jump into the data analysis and visualization part.

In the course you will learn to use Python third-party data analysis libraries such as Pandas, Matplotlib, Seaborn, just to mention a few and tools to boost your productivity such as Spyder and Jupyter.

As you progress through the course, you will be guided step by step on building a program that uses real world data containing hundreds of files and millions of records. You will learn to write Python code that downloads, extracts, cleans, manipulates, aggregates and visualizes these datasets using Python. Apart from following the video screencasts, you will also be required to write your own Python scripts from scratch for completing a data analysis project on income data.

Full details Build 10 advanced Python scripts which together make up a data analysis and visualization program. Solve six exercises related to processing, analyzing and visualizing US income data with Python. Learn the fundamental blocks of the Python programming language such as variables, datatypes, loops, conditionals, functions and more. Use Python to batch download files from FTP sites, extract, rename and store remote files locally. Import data into Python for analysis and visualization from various sources such as CSV and delimited TXT files. Keep the data organized inside Python in easily manageable pandas dataframes. Merge large datasets taken from various data file formats. Create pivot tables in Python out of large datasets. Perform various operations among data columns and rows. Query data from Python pandas dataframes. Export data from Python into various formats such as TXT, CSV, Excel, HTML and more. Use Python to perform various visualizations such as time series, plots, heatmaps, and more. Create KML Google Earth files out of CSV files.

Full details Those who come from any technology field that deals with any kind of data. Those who want to leverage the power of the Python programming language for handling data. Those who need to learn Python basics and want to quickly advance their skills by learning how to perform data cleaning, analysis and visualization with Python – all in one single course. Those who want to switch from programming languages such as Java, C, R, Matlab, etc. to Python.

Full details

Reviews:

“To the point and very fast so relies on good concentration and background with other languages but you can replay if you want to go slower. Information accurate but I would like a bit more context on tricky points but this may slow fast learners down 4 .5 stars (1 only give 5 stars very occasionally)” (Clive Lee)

“I really enjoyed this course. The instructor is very knowledgeable, and explains the material in a very thoughtful way.” (Todd Janczak)

“I am really happy with the course. I’m a GIS Specialist, and this is far better for me than the instructor-led ESRI Python training. I’m really motivated to learn more now thanks to Ardit.” (Grant Huntington)

About Instructor:

Ardit Sulce

Ardit received his master’s degree in Geospatial Technologies from the Institute of Geoinformatics at University of Muenster, Germany. He also holds a Bachelor’s degree in Geodetic Engineering. Ardit offers his expertise in Python development on Upwork where he has worked with companies such as the Swiss in-Terra, Center for Conservation Geography, and Rapid Intelligence. He is the founder of PythonHow where he authors written tutorials about the Python programming language.

Instructor Other Courses:

Interactive Data Visualization with Python & Bokeh Ardit Sulce, Python and GIS Expert, Founder of PythonHow.com (52) $10 $95 The Python Mega Course: Build 10 Real World Applications Python for Beginners with Examples …………………………………………………………… Ardit Sulce coupons Development course coupon Udemy Development course coupon Programming Languages course coupon Udemy Programming Languages course coupon Data Analysis with Python & Pandas Data Analysis with Python & Pandas course coupon Data Analysis with Python & Pandas coupon coupons

The post 85% off #Data Analysis with Python & Pandas – $10 appeared first on Udemy Cupón/ Udemy Coupon/.

from Udemy Cupón/ Udemy Coupon/ http://coursetag.com/udemy/coupon/85-off-data-analysis-with-python-pandas-10/ from Course Tag https://coursetagcom.tumblr.com/post/155983197253

0 notes

lewiskdavid90 · 8 years ago

Text

50% off #Data Analysis in Python with Pandas – $10

Getting an introduction to doing data analysis with the Python pandas library with hours of video and code.

Intermediate Level, – 5 hours, 34 lectures

Average rating 4.5/5 (4.5 (323 ratings) Instead of using a simple lifetime average, Udemy calculates a course’s star rating by considering a number of different factors such as the number of ratings, the age of ratings, and the likelihood of fraudulent ratings.)

Course requirements:

Students need to have Python installed on their computer. Students should be familiar with basic data analysis concepts. Students should have experience writing, at a minimum, basic programs in python.

Course description:

Ever wonder how you can best analyze data in python? Wondering how you can advance your career beyond doing basic analysis in excel? Want to take the skills you already have from the R language and learn how to do the same thing in python and pandas?

THEN THIS COURSE IS FOR YOU!

By taking the course, you will master the fundamental data analysis methods in python and pandas!

You’ll also get access to all the code for future reference, new updated videos, and future additions for FREE! You’ll Learn the most popular Python Data Analysis Technologies!

By the end of this course:

– Understand the data analysis ecosystem in Python.

– Learn how to use the pandas data analysis library to analyze data sets

– Create how to create basic plots of data using MatPlotLib

– Analyze real datasets to better understand techniques for data analysis

At the end of this course you will have learned a lot of the tips and tricks that cut down my learning curve as a business analyst and as a Master’s Student at UC Berkeley doing data analysis. I designed this course for those that have an intermediate programming ability and are ready to take their data analysis skills to the next level.

You’ll understand cutting edge techniques used by data analysts, data scientists, and other data researches in Silicon Valley.

Complete with working files and code samples, over 5 hours with 40+ lectures you’ll learn all that you need to know to turn around and apply data analysis strategies to the data that you work with. You’ll be able to work along side the instructor as we work through different data sets and data analysis approaches using cutting edge data science tools!

Full details Perform data analysis with python using the pandas library. Understand some of the basic concepts of data analysis. Have used n-dimensional arrays in NumPy as well as the pandas Series and DataFrames to analyze data. Learned the basics of plotting with matplotlib This course is best suited for people that need a deeper understanding of data analysis tools available today. This course is not suited for those that want to learn how to program and have no prior programming experience. This course is great for introductory to intermediate python programmers or those that come from a statistical software background like R or SPSS. Analysts who want to better understand a technical approach to analyzing data. Scientists who want to step away from more academic programming languages and use a general purpose language like python. Programmers who are coming from a technical background but want to understand the pydata ecosystem a bit better. Those that are interested in learning a bit more about data analysis.

Full details

Reviews:

“It is a very basic Numpy and Pandas course. However, I can’t figure out what is the business application / practical application of this course. Where in a business world this knowledge is used? Also, why would I use Pandas instead of SAS Statistical Language? Or why would I use Pandas instead of SQL Queries? I think if I had more business examples I would be able to appreciate Numpy and Pandas more.” (Gale K.)

“Good structured course, however some users use Python 3 so there are some mismatches between Python 2 and 3. Well structured course is highly recommended for data scientists , accountancy departments and back offices. Thank you .” (Andrei Ivantsov)

“The course is very precise and too the point explaining applied concepts and useful functions/methods. The course is very focused and relatively short to complete. I would have liked to see some more theory or deep dive to be explained in two 10 min videos to make this course complete. Also if there are some practical examples or case studies thrown in, that would be great.” (Sujay Mudalgi)

About Instructor:

Bill Chambers

Bill Chambers is currently pursuing a Master’s in Information Management and Systems at the UC Berkeley School of Information. Before pursuing this degree, he focused on data architecture and systems scaling at his last employer. He re-architected the company’s entire internal systems operations including redeploying Salesforce internally, implementing Hubspot’s marketing automation software, and integrating Totango’s customer analytics platform. Bill was also responsible for providing operational metrics through statistical analysis using Python (specifically the pandas data analysis library). After UC Berkeley, Bill hopes to help other businesses improve the way their businesses work through data.

Instructor Other Courses:

…………………………………………………………… Bill Chambers coupons Development course coupon Udemy Development course coupon Software Engineering course coupon Udemy Software Engineering course coupon Data Analysis in Python with Pandas Data Analysis in Python with Pandas course coupon Data Analysis in Python with Pandas coupon coupons

The post 50% off #Data Analysis in Python with Pandas – $10 appeared first on Udemy Cupón/ Udemy Coupon/.

from Udemy Cupón/ Udemy Coupon/ http://coursetag.com/udemy/coupon/50-off-data-analysis-in-python-with-pandas-10/ from Course Tag https://coursetagcom.tumblr.com/post/155961153603

0 notes